Compositional Analysis of Complex Mixtures using Automatic MicroED Data Collection

Abstract
Quantitative analysis of complex mixtures, including compounds having similar chemical properties, is demonstrated using an automatic and high-throughput approach to microcrystal electron diffraction (MicroED). Compositional analysis of organic and inorganic compounds can be accurately executed without the need for diffraction standards. Additionally, with sufficient statistics, small amounts of compounds in mixtures can be reliably detected. These compounds can be distinguished by their crystal structure properties prior to structure solution. In addition, if the crystals are of good quality, the crystal structures can be generated on the fly, providing a complete analysis of the sample. MicroED is an effective method for analyzing the structural properties of sub-micron crystals, which are frequently found in small-molecule powders. With an automatic and high-throughput approach to MicroED that uses SerialEM for data collection, data from thousands of crystals provide sufficient statistics to detect even small amounts of compounds reliably.


Additional Text
Automatic high-throughput MicroED data collection
Although automation often requires a significant amount of software development, the approach used in this work relied on widely available software, such as SerialEM and standard X-ray crystallographic processing software. The method involved collecting an atlas montage of the entire grid, followed by medium-magnification montages of the most promising grid squares (Figure 1 in the main article). The eucentric height was automatically determined for each grid square and stored with the medium montage maps. The crystal positions were accurately determined from the montage maps and stored as a list of points in SerialEM. Each point in the list was queued for data collection, and a SerialEM macro was applied to each point. Typically, each grid square contained 10 crystals of the appropriate size and thickness, and about 150 medium-magnification montages were needed to assemble a data collection of 1500 crystals. A typical data set of 50° continuous rotation collected at 2°/s took approximately one minute to collect on the Talos Arctica, and over 1000 complete data sets were collected automatically overnight.
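The throughput figures quoted above can be checked with simple arithmetic. The sketch below uses only the numbers given in the text; the function and variable names are illustrative, and the per-crystal overhead (stage movement, centering, file writing) is not stated in the text but inferred here as the remainder of the roughly one minute per data set.

```python
def rotation_time_s(wedge_deg: float, rate_deg_per_s: float) -> float:
    """Pure exposure time for one continuous-rotation wedge."""
    return wedge_deg / rate_deg_per_s


def montages_needed(target_crystals: int, crystals_per_square: int) -> int:
    """Medium-magnification montages needed to reach the target (ceiling division)."""
    return -(-target_crystals // crystals_per_square)


# Numbers taken from the text.
exposure_s = rotation_time_s(50.0, 2.0)   # 50 degree wedge at 2 degrees per second
montages = montages_needed(1500, 10)      # about 10 usable crystals per grid square

# Inferred, not stated in the text: time per data set that is not exposure.
overhead_s = 60.0 - exposure_s

# Data sets per hour if each takes one minute in total.
sets_per_hour = 3600.0 / 60.0

print(exposure_s, montages, overhead_s, sets_per_hour)
```

Under these assumptions the rotation itself accounts for less than half of the time spent on each crystal, so the remaining overhead is the larger contribution to total collection time.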
We developed a Python script (to be published) to process data generated by the automatic approach in real time. This script was designed to keep up with the speed of data acquisition and to customize the processing for MicroED data recorded using the Falcon detector and the Talos Arctica. The pipeline includes converting the mrc image file format to smv format (using software that is freely available on our lab website https://cryoem.ucla.edu/microed), and running XDS, XSCALE, and XDSCONV to process, merge, and convert the data to SHELX hkl file format. The pipeline uses a brute-force approach to evaluate combinations of frequently used input parameters, because the best statistics are usually not obtained with the same XDS input parameters for every data set. The best solution is selected based on the results of XSCALE. The pipeline then systematically explores combinations of data sets to merge and finally uses a scoring function to select the top merged data sets, which are suggested to the user. This approach resulted in a robust pipeline that processes MicroED data from various samples without the need to select input parameters prior to processing or to assume similarity among data sets.
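The brute-force parameter search described above follows a general pattern that can be sketched as below. This is not the authors' script, which is unpublished. The parameter names (`spot_tolerance`, `min_spot_pixels`), the scoring function, and the `fake_run` stand-in are all hypothetical and do not correspond to real XDS keywords or real processing output. In practice `run_processing` would launch the external programs and parse their statistics.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Params = Dict[str, float]
Stats = Dict[str, float]


def parameter_grid(options: Dict[str, Sequence[float]]) -> List[Params]:
    """Expand {name: [values, ...]} into every combination of values."""
    names = list(options)
    return [dict(zip(names, combo)) for combo in product(*(options[n] for n in names))]


def score(stats: Stats) -> float:
    """Hypothetical scoring function: higher is better."""
    return stats["completeness"] + 10.0 * stats["i_over_sigma"] - stats["r_meas"]


def best_parameters(
    options: Dict[str, Sequence[float]],
    run_processing: Callable[[Params], Optional[Stats]],
) -> Tuple[Optional[Params], float]:
    """Try every combination and keep the one with the highest score."""
    best, best_score = None, float("-inf")
    for params in parameter_grid(options):
        stats = run_processing(params)  # None means processing failed
        if stats is None:
            continue
        current = score(stats)
        if current > best_score:
            best, best_score = params, current
    return best, best_score


def fake_run(params: Params) -> Optional[Stats]:
    """Stand-in for an external processing run; all values are made up."""
    if params["min_spot_pixels"] > 6:
        return None  # pretend indexing failed
    return {
        "completeness": 80.0 + params["spot_tolerance"],
        "i_over_sigma": 5.0,
        "r_meas": 20.0 + 2.0 * params["min_spot_pixels"],
    }


options = {"spot_tolerance": [3.0, 6.0, 9.0], "min_spot_pixels": [3.0, 6.0, 9.0]}
best, best_score = best_parameters(options, fake_run)
print(best, best_score)
```

An exhaustive grid is easy to reason about and to parallelize, at the cost of running every combination; failed runs are simply skipped rather than aborting the search.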

Using high-throughput MicroED for compositional analysis
Once the data were collected and processed, we used the unit cell parameters for chemical identification (Table S6). Using the unit cell parameters to identify the crystal structures has several advantages over solving structures for all data sets. First, it allows shorter rotation wedges to be collected for each data set, increasing the number of data sets that can be collected within a set time. Second, the unit cell parameters are easier to extract than solving structures for all data sets. This is particularly advantageous when collecting data on a large number of crystals that vary in size and diffraction quality, as some data sets may not be good enough for proper structure determination but sufficiently good for accurate unit cell and symmetry determination. Finally, using unit cell parameters to identify structures results in a higher accuracy of assignments, which is crucial for the identification of compounds present in small amounts in the mixture. Overall, this approach allowed for the successful identification of all salts, including those present in the lowest amounts in the mixture.
To validate the approach, we assessed the risk of assigning a data set to the wrong crystal structure based on similarities in the unit cell parameters. We established windows around the published values for the unit cell dimensions and angles, allowing for deviations of up to 1 Å and 10°, respectively. These restrictions were compared with the actual distributions of unit cell parameters in the sample mixtures, and the results showed that the actual parameters fell well within the allowed values (Figures S1 and S2). Furthermore, we found no overlap in the absolute values and combinations of unit cell parameters for any two crystal forms, indicating that the proper crystal structure could be unambiguously determined for all the compounds involved in the experiments. Overall, the analysis confirmed that the unit cell parameters were unique and easy-to-compare properties of the crystal structures and accurately identified the compounds in the mixtures.
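The tolerance-window assignment described above can be sketched as follows. The 1 Å and 10° tolerances come from the text; everything else is an assumption for illustration. In particular, the reference cells are placeholders (not values from Tables S9 or S10), the compound names are hypothetical, and the sketch assumes that measured and reference cells are given in the same setting, with no handling of axis permutations or cell reduction.

```python
from typing import Dict, Optional, Sequence

Cell = Sequence[float]  # a, b, c in angstroms, then alpha, beta, gamma in degrees

LENGTH_TOL = 1.0   # angstroms, from the text
ANGLE_TOL = 10.0   # degrees, from the text


def within_window(cell: Cell, ref: Cell) -> bool:
    """True if all six parameters lie inside the tolerance window around ref."""
    lengths_ok = all(abs(x - r) <= LENGTH_TOL for x, r in zip(cell[:3], ref[:3]))
    angles_ok = all(abs(x - r) <= ANGLE_TOL for x, r in zip(cell[3:6], ref[3:6]))
    return lengths_ok and angles_ok


def assign(cell: Cell, references: Dict[str, Cell]) -> Optional[str]:
    """Return the single matching reference name, or None if zero or several match."""
    matches = [name for name, ref in references.items() if within_window(cell, ref)]
    return matches[0] if len(matches) == 1 else None


# Placeholder reference cells, for illustration only.
references = {
    "compound_X": (5.0, 7.0, 12.0, 90.0, 90.0, 90.0),
    "compound_Y": (8.5, 8.5, 20.0, 90.0, 105.0, 90.0),
}

print(assign((5.3, 6.8, 12.4, 90.0, 91.0, 90.0), references))
print(assign((8.2, 8.9, 19.5, 90.0, 103.0, 90.0), references))
print(assign((30.0, 30.0, 30.0, 90.0, 90.0, 90.0), references))
```

Returning `None` when more than one reference matches mirrors the requirement in the text that the windows of different crystal forms must not overlap for an assignment to be unambiguous.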
In addition, to validate the crystal selection process, the diffraction quality of the selected crystals was analyzed. Moreover, the refined R1 factors reflect an overall excellent MicroED data quality with no significant difference between the compounds. These results suggest that the crystal selection process did not introduce significant bias in the diffraction quality or thermal motion of the selected crystals. For each crystal, the volume of the diffracting part was estimated by multiplying the area of the crystal by a thickness factor that accounts for the variation in crystal thickness. This thickness factor was calibrated by comparing the electron transmission through several crystals of known thickness, and it was assumed to be constant for all crystals within the selected size range. The estimated crystal volumes were found to be similar across all compounds, indicating that the sample preparation process did not result in a bias towards a particular compound due to differences in crystal hardness or brittleness.
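The volume estimate described above is a product of the crystal area and a constant thickness factor. A minimal sketch follows, assuming the thickness factor can be expressed as an effective thickness in µm; the text does not give its units or value, so the numbers and compound names below are placeholders.

```python
from statistics import mean
from typing import Dict, List


def crystal_volume(area_um2: float, thickness_factor_um: float) -> float:
    """Estimated diffracting volume: area times a constant thickness factor."""
    if area_um2 < 0 or thickness_factor_um < 0:
        raise ValueError("area and thickness factor must be non-negative")
    return area_um2 * thickness_factor_um


def mean_volume_by_compound(
    areas_um2: Dict[str, List[float]], thickness_factor_um: float
) -> Dict[str, float]:
    """Mean estimated volume per compound, for comparing across compounds."""
    return {
        name: mean([crystal_volume(a, thickness_factor_um) for a in areas])
        for name, areas in areas_um2.items()
    }


# Placeholder areas and thickness factor, for illustration only.
example = mean_volume_by_compound({"X": [1.0, 3.0], "Y": [2.0]}, 0.5)
print(example)
```

Because the thickness factor is assumed constant, comparing volumes across compounds under this model is equivalent to comparing their mean crystal areas.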

Data processing
To handle the thousands of data sets produced in these MicroED experiments, an automated approach for data analysis was necessary. In addition to the commonly used pipelines for X-ray structure determination, semi-automated MicroED-focused systems have been published, but none are fully automatic. For this study, a new Python script was developed for optimal treatment of MicroED data. The script runs all the steps from image conversion through integration to small-molecule structure determination and at the same time optimizes input parameters iteratively. MicroED data require slightly different input parameters compared to X-ray data because of the different setup. Typically, the X-ray beam generated by an insertion device at a synchrotron storage ring takes a rather long and straight path, bent only by mirrors for the purpose of focusing the beam, and is inherently parallel owing to the long distance between the source and the sample. In an electron microscope, the electrons are easily bent by electromagnetic lenses, which are used to focus, shape, tilt, and size the beam, making it parallel as well as compensating for deviations introduced by previous lens imperfections. As a result, the exact position of a lattice point in the three-dimensional reciprocal lattice, and therefore also its reflection onto the detector surface, is affected by the electron beam alignment to a larger extent than for X-rays or neutrons. Therefore, a more generous allowance for errors in the lattice should be given to processing software such as XDS.

Figures
Figure S1 Example of crystals from Mixture A that appear to have "melted", from which a lower proportion of data sets was successfully indexed. There are visually two areas of different contrast: a darker, particle-shaped inner area surrounded by a lighter, drop-like outer area. It seems likely that the crystals partially decomposed before being frozen in the microscope, resulting in poor diffraction.

Figure S2 Compound identification in Mixture A. The bars represent the number of crystals identified for each component in Mixture A. Despite the smaller number of data sets used, 913 in total, all compounds could be identified, including the compounds with the lowest relative mass of 3% of the total weight of the composition.

Figure S3 Unit cell parameters from processing of Mixture B. Individual unit cell parameters are plotted as red dots, means and standard deviations as black lines, and tolerance ranges as blue and green boxes. The experimental unit cell parameters fall well within the selected tolerance limits, which ensures an accurate unit cell determination. There is no overlap of the six unit cell parameters between the space groups, and the structure assignments were conclusive.

Figure S4 Unit cell parameters from processing of Mixture C. Individual unit cell parameters are plotted as red dots, means and standard deviations as black lines, and tolerance ranges as blue and green boxes. The experimental unit cell parameters fall well within the selected tolerance limits, which ensures an accurate unit cell determination. There is no overlap of the six unit cell parameters between the space groups, and the structure assignments were all conclusive.

Figure S5 Unit cell parameters from processing of the aspirin tablet. Individual unit cell parameters are plotted as red dots, means and standard deviations as black lines, and tolerance ranges as blue and green boxes. The experimental unit cell parameters fall well within the selected tolerance limits, which ensures an accurate unit cell determination. There is no overlap of the six unit cell parameters between the space groups, and the structure assignments were all conclusive.

Figure S6 Unit cell parameters from processing of the acetaminophen tablet. Individual unit cell parameters are plotted as red dots, means and standard deviations as black lines, and tolerance ranges as blue and green boxes. The experimental unit cell parameters fall well within the selected tolerance limits, which ensures an accurate unit cell determination. There is no overlap of the six unit cell parameters between the space groups, and the structure assignments were all conclusive.

Figure S7 Pearson correlation plot of unit cell parameters from processing in mixture B and the reference unit cell parameters.

Figure S8 Pearson correlation plot of unit cell parameters from processing in mixture C and the reference unit cell parameters.

Figure S9 Pearson correlation plot of unit cell parameters from processing in aspirin tablet and the reference unit cell parameters.

Figure S10 Pearson correlation plot of unit cell parameters from processing in acetaminophen tablet and the reference unit cell parameters.

Table S1
Statistics from the various steps of SerialEM data collection for Mixture A-C.

Table S2
Statistics from the various steps of SerialEM data collection for the aspirin tablet and the acetaminophen tablet. Sets of undefined status were typically diffraction patterns not good enough for unit cell determination. This group likely contains several components of the nonactive ingredients with undeclared diffraction properties, such as starch, carnauba wax, FD&C yellow no. 6 aluminum lake, and flavors. Starch alone is reported to have several crystalline states with various degrees of crystallinity, and the group with sets of undefined status was therefore removed from evaluation as a whole.

Table S3
Input composition and the number of crystals identified in Mixture A.

Table S4
Input composition in Mixtures B and C.

Table S5
Ingredient composition of the aspirin and acetaminophen tablets.
a Composition was obtained from the "Drug facts" declared on the packages.
b The aspirin tablet contains 81 mg aspirin (active ingredient) and others (inactive ingredients) including corn starch, dextrose excipient, FD&C yellow no. 6 aluminum lake, flavors, saccharin sodium.
c The acetaminophen tablet contains 500 mg acetaminophen (active ingredient) and others (inactive ingredients) including carnauba wax, hypromellose, polyethylene glycol, povidone, pregelatinized starch, stearic acid.
d May contain one or more of these ingredients: corn starch, croscarmellose sodium, sodium starch glycolate.

Table S6
Compositional analysis in Mixtures B and C.
a 'No. of xtals' represents the number of crystals identified for each component.
b The crystal areas were converted from pixel² to µm² using the pixel size and magnification.
c RatioObs was calculated by dividing the total area of one component by the total area of all crystals.
e Contains crystals of α-D-glucose, β-D-glucose and α-D-glucose monohydrate.

Table S7
Compositional analysis in the aspirin and acetaminophen tablets.
a 'No. of xtals' represents the number of crystals identified for each component.
b The crystal areas were converted from pixel² to µm² using the pixel size and magnification.
c RatioObs was calculated by dividing the total area of one component by the total area of all crystals.
d Contains two polymorphs of aspirin.

Table S8
MicroED refinement statistics for the crystal structures solved from the automatic SerialEM data collection in Mixture C (the mixture containing amino acids), the aspirin tablet, and the acetaminophen tablet. Structures were solved ab initio by SHELXT and refined by SHELXL following automatic processing by the Python script used in this study.
a Completeness of the merged data sets by XSCALE.
b R1 from SHELXL refinement.

Table S9
Reference unit cell parameters for the compounds in Mixtures A-C.

Table S10
Reference unit cell parameters for the compounds in the aspirin and acetaminophen tablets.
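The area conversion and the RatioObs value defined in the footnotes to the compositional analysis tables can be sketched as follows. The function names and the example numbers are illustrative only; the pixel size would come from the calibrated magnification of the montage images, which is not given here.

```python
from typing import Dict, List


def px2_to_um2(area_px2: float, pixel_size_um: float) -> float:
    """Convert an area in pixel^2 to um^2 given the edge length of one pixel in um."""
    return area_px2 * pixel_size_um ** 2


def ratio_obs(areas_by_component: Dict[str, List[float]]) -> Dict[str, float]:
    """Total area of each component divided by the total area of all crystals."""
    totals = {name: sum(areas) for name, areas in areas_by_component.items()}
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("no crystal area to normalize by")
    return {name: total / grand_total for name, total in totals.items()}


# Placeholder areas in um^2, for illustration only.
ratios = ratio_obs({"A": [1.0, 3.0], "B": [4.0]})
print(px2_to_um2(100.0, 0.5), ratios)
```

Because the ratio is area-weighted, a component represented by a few large crystals can contribute as much as one represented by many small crystals.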